Graphical Representation of Textual Data Using Text Categorization System

Authors

  • Akshay Kumar
  • Vibhor Harit
  • Balwant Singh
  • Manzoor Husain Dar
Abstract

This paper presents a graphical representation of textual data based on text categorization, concentrating on a compact representation of the document. Text categorization has become an important task in data mining (text mining) because of the growth of electronic commerce over the internet. Organizations whose business is based on the internet need an effective categorization method to manage the large amounts of textual data available in forms such as sales orders, summary documents, emails, journals, and memos. We use both global and local feature selection methods, and the local method that we introduce also improves the accuracy of the classifier. The classifier is K-NN (K nearest neighbors). K-NN is simple and achieves better precision in classifying documents. It also needs no training resources or prebuilt model and categorizes on the fly. Its cost is therefore low, since nothing needs to be trained, and we find its accuracy to be better than that of other classifiers.
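
The K-NN step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the document representation or the similarity measure, so the bag-of-words term counts, the cosine similarity, the majority vote, and every name used here (`vectorize`, `cosine`, `knn_classify`, the sample documents) are assumptions chosen for illustration. Feature selection is omitted.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into lowercased bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed similarity measure)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, labeled_docs, k=3):
    """Label a query by majority vote among its k most similar labeled documents.

    No model is built in advance: all similarity computation happens at query time.
    """
    q = vectorize(query)
    scored = sorted(
        ((cosine(q, vectorize(text)), label) for text, label in labeled_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled documents, for illustration only.
docs = [
    ("invoice for sales order shipped items", "sales"),
    ("sales order quantity and unit price", "sales"),
    ("meeting memo for the project team", "memo"),
    ("memo about the office meeting schedule", "memo"),
]

print(knn_classify("new sales order received", docs, k=3))
```

Because the labeled documents are only compared against the query when a query arrives, there is no separate training step, which matches the "on the fly" behaviour the abstract describes. The trade-off is that each classification scans every labeled document.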

Similar resources

When Naïve is not Enough: Bringing Naïve Bayes Text Categorization to "Surface"

Since information has become more and more available in digital format, especially on the World Wide Web, organizing and classifying digital documents, making them accessible and presenting them in a proper way are becoming important issues. Digital Library Management Systems (DLMSs) are an example of systems that manage collections of multi-media digitalized data and include components that pe...

Opinion Mining in Hungarian based on textual and graphical clues

Opinion Mining aims at recognizing and categorizing or extracting opinions found in unstructured text resources and is one of the most dynamically evolving subdisciplines of Computational Linguistics, showing some resemblance to document classification and information extraction tasks. In this paper we propose a novel approach in Opinion Mining which combines Machine Learning models based on trad...

Textual Data Representation

We address in this report the problem of representing formally textual data. First, this problem is replaced in the context of automatic text processing. Then, the weaknesses of the basic document representation, i.e. the bag-of-words representation, are explained and some state-of-the-art methods claiming to overcome these weaknesses are reviewed. Moreover we propose a novel graphical model, th...

The Effect of Visual Representation, Textual Representation, and Glossing on Second Language Vocabulary Learning

In this study, the researcher chose three different vocabulary techniques (Visual Representation, Textual Enhancement, and Glossing) and compared them with traditional method of teaching vocabulary. 80 advanced EFL Learners were assigned as four intact groups (three experimental and one control group) through using a proficiency test and a vocabulary test as a pre-test. In the visual group, stu...

Textual Modelling Embedded into Graphical Modelling

Today’s graphical modelling languages, despite using symbols and connections, represent large model parts as structured text. We benefit from sophisticated text editors when we use programming languages, but we neglect the same technology when we edit the textual parts of graphical models. Recent advances in generative engineering of textual model editors allow to create such sophisticated text e...

Publication date: 2014